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Abstract 

We study the mechanism behind dynamical trappings experienced during Wang-Landau sam¬ 
pling of continuous systems reported by several authors. Trapping is caused by the random walker 
coming close to a local energy extremum, although the mechanism is different from that of critical 
slowing down encountered in conventional molecular dynamics or Monte Carlo simulations. When 
trapped, the random walker misses entire or even several stages of Wang-Landau modification fac¬ 
tor reduction, leading to inadequate sampling of configuration space and a rough density-of-states 
even though the modification factor has been reduced to very small values. Trapping is depen¬ 
dent on specific systems, the choice of energy bins, and Monte Carlo step size, making it highly 
unpredictable. A general, simple, and effective solution is proposed where the configurations of 
multiple parallel Wang-Landau trajectories are inter-swapped to prevent trapping. We also explain 
why swapping frees the random walker from such traps. The efficacy of the proposed algorithm is 
demonstrated. 
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I. INTRODUCTION 


Wang-Landau sampling (WLS) [TJ [2j is increasingly becoming an important tool for 
simulating a wide range of slow-relaxing, glassy systems which are not tractable by more 
conventional techniques such as molecular dynamics (MD) or Monte Carlo (MC) sampling 
using Metropolis algorithm. To date, WLS has been applied to a wide variety of systems 
ranging from random bond and random field systems M, atomic fluid [3] and clusters [6], 
liquid crystals [7], polymers [SI El E] and proteins [TU1 - H2] . to lipid membranes in explicit 
solvent PS HJ. In MD and MC, one frequently encounters the problem that the system 
gets trapped in some local minimum due to the presence of large barriers on the energy 
landscape [11 ! , [15, 13 [29]. WLS (and related entropic sampling techniques PZHIH]), on 
the other hand, circumvents this problem by sampling from the density-of-states (DOS) 
instead of the Boltzmann distribution. Since there are no energy barriers to overcome, the 
system readily moves back and forth between the high and low energy regions of phase 
space, thereby ensuring efficient sampling. Due to the effectiveness of such multicanonical 
techniques, related methods such as Replica-Exchange Multicanonical Algorithm [2QH23] 
and Statistical-Temperature Monte Carlo [ 2TI [25 ] are very efficient for simulating systems 
with rough energy landscapes. 


Although WLS is initially believed to be free from the problem of slow relaxation, there 
is increasing evidence that WLS is subjected to another kind of slowing down when applied 
to continuous systems. In an early study, Brown and Schultess [26] reported that for the 
ferromagnetic Heisenberg model, WLS is expensive when sampling from rare configurations 
close to the ground state. Jayasri et al. [7], in their study of liquid crystals, reported that 
WLS becomes extremely slow even for moderately large systems because the system often 
gets stuck in certain regions of configuration space. In their study of polypeptides and 
Lenard-Jones clusters using WLS, Poulain et al. [6] also reported similar trappings of their 
systems. 
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II. MODELS AND THE WANG-LANDAU METHOD 


To examine the nature of these trappings, we consider two systems. The first is the 
two-dimensional L x L square-lattice, frustrated XY model, 

H = cos(0i-0j), ( 1 ) 

(O') 

where 6 r e (— tt, tt) ( i = 1,...,L 2 ) are XY spins, (i,j) denotes summation over nearest- 
neighbor pairs. Periodic boundary condition is used for both lattice dimensions. J %3 are 
identical independent random variables sampled from a gaussian distribution with zero mean 
and unit variance. Single-spin MC moves are used. The second system is an 8-mer poly¬ 
alanine molecule in a medium with dielectric constant e = 2. We used the Amber99-bs0 
force-field m and simulated using a modified version of the MOSAICS package [28]. Torsion 
angle moves were used. We choose these two systems as illustrations because they are simple 
enough but yet possess rough energy landscapes with many local minima. WLS studies of 
the poly-alanine molecule have also been reported by Poulain et ah [6]. 

Simulations were performed using the standard WLS algorithm [T, j2]J. In WLS, the 
quantity of interest is the density-of-states g{E), or the number of all possible states at the 
energy E. Once g(E) is known, the partition function can be calculated, 

Z= e~ E/kBT = ^g{E)e~ E/kBT , (2) 

{configurations} E 

where ks is the Boltzmann constant and T is the temperature, and most thermodynamic 
quantities can be obtained from Z. WLS estimates g(E) via a random walk in energy 
space. Trials for a state with energy E t to one with energy Ef are accepted according to the 
transition probability, 

p(E, -> E,) = min (l, • (3) 

If g(E) is the true DOS, Eq. (|3]) obeys detailed balance. However, g(E) is initially unknown, 
and one makes a guess for it. Whenever an energy E is visited according to the transition 
rule Eq. ([ 3 ]), the DOS is updated as g(E) / • g(E) where / is called the modification 
factor. The role of the modification factor is to make gradual refinement of the initially 
inaccurate DOS. One begins the simulation with a large / (usually / = /o = e 1 ~ 2.71828) 
and gradually lowers it, in stages, to 1. When / = 1 is reached, detailed balance for Eq. 


3 



(J3| is recovered, and one obtains the true DOS for g(E). In practice, one works with 
In g(E) to prevent numerical overflow, and the update rule is In g(E) In g(E) + In/. In 
our simulations, we start from In f 0 = 1 and lowers the modihcation factor according to 
ln/j + i | In upon completing the ith stage of simulation. 

In the original formulation of WLS, one computes an energy histogram and uses its 
‘flatness’ as a criterion to proceed to the next stage of simulation. However, as this criterion 
has been found to be quite arbitrary by several subsequent studies [29H33] . we use the 
criterion discussed in [33j to lower In/. Another modification we adopted here is the one 
suggested by Zhou et al. [Mj that for continuous systems the DOS should be linearly 
interpolated if the energy falls between the centers of two energy bins. Hence in our analysis 
the DOS is piecewise linear. The case of piecewise constant DOS will be discussed at the 
end of this paper. 


III. TRAPPINGS DURING WANG-LANDAU SAMPLING 

Figs. [IJa) and (b) show examples of the DOS where the system exhibits signs of being 
trapped for the XY model and poly-alanine molecule, respectively. The main feature is 
the high spikes, which are formed because the random walker stayed for an inordinate 
amount of time in a single energy bin, resulting in most of the modihcation factor In / 
being accumulated there. We have checked that the system is capable of proposing new 
configurations with fairly large changes in energy with the MC step size we have used (c.f. 
Fig. [3]^b) and discussion below). Nevertheless, during the period of time when the random 
walker is stuck in a spiked bin, we found that all its MC moves result in small energy changes. 
The reason is not in a choice of small MC step size, but is due to the fact that the random 
walker has wandered close to a stationary point in configuration space. To ascertain that 
the random walker is close to a stationary point, we computed its energy gradient and found 
it to be very small indeed. 

Fig. 0 is a schematic diagram illustrating the mechanism of trapping and how a spike 
develops. Suppose, as shown in the inset, the system is at a local minimum with configuration 
C and energy E. Let E lie within the bin b as shown. As the random walker is at a 
local minimum, the proposed energy E' must be > E, and the change in energy is small 
because the first derivative vanishes. Due to the steepness of the DOS, the acceptance 
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ratio exp [In g(E) — In g(E')\ is very small, the move is mostly rejected, and In / is added 
to the bin b. This process repeats itself, with the DOS at bin b constantly increasing but 
the configuration hardly changing. If the system can propose a move to an energy, say E" 
or higher, then the move may be accepted and the random walker can escape the trap. 
But this is highly improbable due to the vanishing of the gradient. The situation becomes 
more serious as system size increases because one usually increases the bin width in order to 
cover a wider energy range and reduce computational cost; however, the change in energy 
produced by single-spin or set of small torsion angle moves remains similar, and therefore 
becomes smaller relative to the increasing bin width. 

Figs. [3^a)- (c) illustrate, for the XY model (L = 16), typical behavior of a trapped 
trajectory, compared against ‘normal’ (i.e., not trapped) behavior. The trapped segment is 
taken from a continuous period of time the random walker is inside a spiked bin (k8x 10 5 
steps in this case). The normal segment is of similar length and taken from along the same 
trajectory but starting at a slightly earlier time before the random walker gets trapped. To 
monitor the configuration of the system, we computed the autocorrelation function, 

L 2 

C(t) = -Jj ^2 cos [0i(O) - 0i(r)], (4) 

i=l 

where r denotes the number of MC steps that have elapsed since the start of the segment. 
The results are shown in Fig. [3j^a) . For the trapped segment, C{r) remains close to unity 
throughout, meaning that its configuration hardly changes; for the normal segment, C(t) 
decays to zero very quickly. The inset in (a) plots the component of the force dH/09 ; with the 
largest magnitude (i.e. infinity norm) versus r. The force on the trapped segment is almost 
zero throughout. Fig. [3](b) is a scattered plot showing the proposed energy change A E versus 
the proposed angle change A 9. As mentioned above, with the MC step size we are using, 
the random walker is able to, under normal conditions, propose new configurations with 
fairly large changes in energy, as shown by the large dispersion of blue points (i.e. normal 
segment). However, when trapped, A E is very small compared to the normal segment, and 
is distributed asymmetrically with very few energy-lowering moves. The acceptance rate is 
~1% and ~60% for the trapped and normal segments, respectively. The energy distribution 
of the accepted moves are shown in Fig. [3j[c) . For the trapped segment, the histogram is 
very narrow and asymmetric, with almost only energy-lowering moves being accepted. We 
also examined other cases of trappings, both for the XY model (varying over a range of MC 
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step sizes) and the alanine peptide molecule, and found the above behavior to be general in 
both systems. 

Spikes are usually formed during the early stages of the simulation when the DOS is still 
rough. A small random bump on the DOS like the one in Fig. [2] can serve as a ‘seed’ for 
spike growth if the random walker accidentally visits a local minimum (or maximum, where 
similar arguments apply) whose energy happens to lie within the same energy bin as the 
bump. 

Spikes do not necessarily have to form in the same bin everytime. This is because there 
are other local minima/maxima that reside in other energy bins. In addition, as shown in 
the inset of Fig. [2j the random walker can sample other configurations, such as D, with the 
same energy E. This means that, spiking does not always occur in an energy bin where a 
spike was previously observed. Therefore spiking events are highly unpredictable. 

A large amount of simulation time is wasted when the random walker is trapped because 
it spends less time sampling other parts of configuration space. When a spike is discovered 
at the end of one’s simulation one should discard the results knowing it suffers from bad 
sampling, as shown later in our calculation of specific heat capacity. A simple fix to the 
problem would be to vary the energy bin width and/or sampling step size. Unfortunately, it 
is not possible to know a priori the appropriate bin width and step size to choose to avoid 
spiking (if at all possible), unless inordinately large step sizes are used in conjunction with 
small bin widths. Even then, this is not a feasible general solution, as large moves make it 
difficult to sample low energy basins, while using small bin widths is impractical, especially 
for systems with large energy ranges. 


IV. PROPOSED SOLUTION: TRAJECTORY-SWAPPING 

Here, we introduce a general solution. We propose running multiple trajectories of WLS 
in parallel, each starting from a different random seed (or initial condition). Periodically, 
one randomly pairs the configuration of each trajectory with a new DOS from another 
trajectory. If the random walker happens to be trapped in a spike, this swapping resets the 
configuration to a ‘safe’ region of energy space and stops the spike from growing further. 
The idea is similar to that of replica-exchange Monte Carlo [35], where a low temperature 
state trapped in a local minimum is swapped with a higher temperature one, enabling it 
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to escape and sample other regions of configuration space. Swapping between different WL 
random walkers also ensures that the sampling is uniform in energy space. Our method is 
simple to implement and completely general, applicable to any energy binning or sampling 
step size. The DOS from all the trajectories can be averaged at the end of the simulation 
so no computational resources are wasted. 

Recently, Vogel et al. [13] proposed a parallel WL method to extend WLS to large-scale 
problems. In their method, different random walkers in overlapping energy windows are run 
in parallel and occasionally exchanged so as to allow the entire energy range to be sampled 
simultaneously and quickly. The purpose of our current work, on the other hand, is to 
use parallelization and swapping as a way to prevent trapping. In addition, each of our 
random walker samples the entire energy range, and we do not define a separate transition 
probability for the exchange of two random walkers. The method of Vogel et al. may also 
be effective in spike prevention. However, the general focus and actual implementation of 
the two works are different. 

We tested our algorithm on the frustrated XY model. Let 0" = (0“, • • • , 6 ^ 2 ) and g a (E) 
denote, respectively, the configuration and DOS of the ath trajectory, where a — 1, • • • , N 
denotes the random seed of each trajectory. Every (T x L 2 )th MC step, a shuffling algorithm 
randomly permutates the trajectory index a —>■ a ', and assign O a ' as the new configuration of 
g a (E). The main factors affecting the efficacy of the algorithm are the time interval between 
swaps T x L 2 and the total number of trajectories N. Swapping should be as frequent as 
possible so that spikes do not have the chance to build up. Using more trajectories reduces 
the probability of swapping two trajectories which are simultaneously trapped in the same 
energy bin. On the other hand, as we implemented the algorithm using message passing 
interface, the swapping process requires communication between different processors and 
incurs computational overhead, so one should also try to keep T large and N small. 

As our swapping between trajectories is accepted with probability 1 , strictly speaking, 
it violates the transition rule of WLS. However, as the number of swaps we performed is 
very small compared to the entire length of the trajectory, no serious error is introduced, as 
shown by our numerical experiments below. 
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V. NUMERICAL SIMULATIONS 


In our simulations, we consider one set of ./ ?J for each L [36j, and study the effects of 
T , N, and random shuffling on the prevention of spike occurrences. The energy binning for 
the DOS is chosen such that the bins are narrow enough to support the energy distribution 
g(E)e~P E near zero temperature. Such narrow bin widths easily induce spiking in the 
original WLS algorithm. Admittedly, the original WLS might not induce spiking for other 
choices of bin widths, but we did not perform an exhaustive study on this issue, due to the 
unpredictability of spiking, as previously discussed. All N trajectories are launched from 
the same initial configuration. Each In / stage is simulated for 5 x 10 6 x L 2 MC steps, and 
In / is halved after each stage. We then counted the percentage of the N trajectories that 
have spiked DOS after the tenth In / stage. 

The results are summarized in Table [T] For each L. we studied N = 8,16, 32 and T = 
250,1000, 5000, oo where T = oo means no swapping (i.e. original WLS). Each entry in 
the table shows the percentage of spiked DOS in the order L = 16/32. For T = oo, more 
than half of the N trajectories exhibits spiking. For each of the rest of the N-T entries, the 
percentage is averaged over five runs where each run uses a different seed for the random 
shuffler. As expected, large N (32) and small T (250) is the most effective in preventing spike 
formation. Between N and T, the latter is more important in preventing spikes. We found 
that is it important to keep T < 1000 because otherwise swapping is ineffective. Increasing 
N has a comparatively more gradual albeit still significant effect in preventing spikes. 

We checked the accuracy of the DOS of our swapping algorithm by computing the specific 
heat capacity per spin, c v , for L = 16. From the five runs of N = 32, T = 250, we took one 
run and continued the swapping simulation until In / « 5.96 x 10~ 8 [37]. The c v of all 32 
trajectories were calculated and averaged, giving (c v ). The value of In g(E) in any spiked bin 
is replaced by the average of its two neighboring bins. The same is repeated for T = oo, and 
results of replica-exchange Monte Carlo calculations are used as benchmark. The results are 
shown in Fig. |4](a). T = 250 and replica-exchange agree very well, even at low temperatures 
as shown in the inset. On the other hand, the ( c v ) of T = oo suffers from large fluctuations. 
We found that the trajectories of T = oo can be separated into two types: (i) Those with no 
spikes in their DOS give a ( c v ) that have successfully converged, as shown by the red (labelled 
‘converged’) curve in Fig. |4](b). (ii) Trajectories with spikes on their DOS sometimes spend 


entire or even a few In / stages confined either within the spiked bin itself or ‘bouncing’ 
between the spike and edge of the DOS. When the random walker managed to escape, the 
In/ has already been lowered so much that the current modification factor does not give 
effective improvement to the DOS, and thus the DOS is effectively frozen at an earlier In/ 
stage. The ( c v ) computed from these trajectories are therefore unconverged, as shown by 
the blue (labelled ‘unconverged’) curve in Fig. |4](b). Our observation also explains why 
the annealing method by Poulain et al. [6] was successful. By re-increasing the In /, their 
random walker is allowed to redo those earlier In / stages when it was trapped. 


VI. DISCUSSIONS 

As mentioned above, the phenomenon of trapping is highly dependent on how the sim¬ 
ulation is being set up, i.e. the kind of system, the energy binnings, the MC step size etc. 
We would like to highlight two scenarios where we think there is an increased likelihood 
of encountering trapping. The first is when it is preferable to use a small MC step size to 
achieve a good acceptance rate. In our study of the poly-alanine molecule, we found that 
the molecule sometimes fold into very tight conformations where small MC step sizes are 
necessary to enable the molecule to gently unfold (large moves will give rise to the ‘lever- 
arm’ effect resulting in steric clashes). If small step sizes are used, then one needs to be 
aware of the trapping mechanism we have discussed above. 

The second scenario is when one is interested in low temperature behavior, which in 
WLS means sampling from low-lying energy states. To illustrate, we ran 200 independent, 
non-swapping trajectories for the XY model (L = 16) discussed above. At the end of the 
15th In / stage, we checked if there is a spike in their DOS, and if yes, the energy bin at 
which the spike occurs. The energy distribution of the spike locations is shown in Fig. [5j 
Spiking occurs predominantly at an energy range near to the ground state at if ~ —346. 
This energy range corresponds roughly to the temperature T = 0.01. In the inset of Fig. 
[HJ the energy probability distribution P(E ) = g(E)e~ E ^ T at T = 0.01 is shown, and we see 
that the distribution is supported on an energy region where spiking is dominant. Hence, 
one should be aware of the possiblity of trapping when one investigates low temperature 
phenomena. 

On the other hand, trapping might not be so important when one studies high ternper- 
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ature behavior. We use the same XY model as example. From Fig. [IJ we see that the 
transition temperature is around T = 0.4. In the inset of Fig. [5j we plot the P(E) at 
T = 0.4. It is seen that the distribution lies quite far away from the energy region where 
spiking occurs. Hence, one might impose a low energy threshold to prevent the random 
walker from visiting low energy states if one is only interested in obtaining the transition 
temperature. 

In our algorithm, swapping between trajectories occur with probability 1. As mentioned, 
this violates detailed balance. One way to rectify this is to treat the swapping as a MC step 
by itself. The acceptance probability for the swap has been proposed by Vogel et al. [T3] . 


^accept = min 


r 9i(E\Y}) Sj (E[X]) 


(5) 


where X and Y are the configurations of trajectories i and j before the swap, and E[X] and 
E[Y] are their energies, respectively. gi(E[X]) is the DOS of trajectory i at energy E[X], 
Hence, every few steps, a swap can be attempted and accepted according to Eq. (j5|. This 
way, detailed balance is obeyed and the swapping interval T can be made quite small. 

If the system under study is small enough for many multiple trajectories to be simu¬ 
lated on a single processor, one should indeed adopt Eq. (|5]). However, for large systems 
where one assigns different trajectories to different processors, one need to take into account 
the computational overhead coming from the communication among the processors during 
swapping. It may then be too expensive to use Eq. (J5]) to swap since frequent swapping 
will slow the computation down. On the rare occasion when a swap is attempted, it is best 
to let the swap succeed as we have done in this paper. Although theoretically less rigorous, 
our method is faster. Bouzida et al. has also shown within the context of the acceptance 
ratio method that occasional and infrequent violation of detailed balanced does not incur 
serious error [38j. Our calculation of the specific heat capacity also shows that there is no 
discernible error incurred from this violation of detailed balance during swapping. 

Lastly, we discuss the case of piecewise constant DOS alluded to earlier. The original 
WLS algorithm takes the value of In g(E) as constant within each energy bin. This will not 
give rise to any spiking, as without linear-interpolation the mechanism of Fig. [2] no longer 
holds. However, for piecewise constant DOS the energy bins at the very edge of the DOS 
will usually only be visited after the first few In / stages have been completed. By the time 
the random walker drops into these bins, the difference in In g(E) between these bins and 
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the neighboring, higher-energy ones is already so large that exp[A In g(E)\ is effectively zero, 
and the random walker stays a long time in these bins to let the DOS ‘catch up’ before it 
can escape. This leads to another form of trapping. The swapping algorithm presented here 
is also applicable to this problem encountered by piecewise constant DOS. 

We thank P. Poulain for useful discussions. 
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FIG. 1. (Color online) Spikes on the DOS for (a) the frustrated XY model (L = 16) and (b) the 
8-mer poly-alanine. For both, we performed ten stages of WLS where each stage is simulated for 
5 x 10 6 updates per spin/angle, and the modification factor In / is halved at at each stage. Shown 
are the DOS’s from a single trajectory at the end of the tenth stage. 

In 9(E) 



FIG. 2. Schematic diagram illustrating how the random walker gets trapped inside a spiked bin. 
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FIG. 3. (Color online) Comparing the behaviors between a trapped and normal segments of 

trajectory for the XY model (L = 16). (a) Autocorrelation function and infinity norm of force 

along the segments, (b) Scatter plot of proposed energy change A E versus proposed angle change 

A6. (c) Energy distribution of accepted moves. 
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Temperature 

FIG. 4. (Color online) (a) Specific heat capacity per spin of the XY model for a fixed set of Jij 
(L = 16). The (c v ) for T = 250 and T = oo is calculated by averaging over 32 trajectories. For 
replica-exchange Monte Carlo (circles), ( c v ) is computed by averaging over 25 independent runs. 
Inset: Close-up view near zero temperature of T = 250 (filled circles) and replica-exchange (empty 
circles). Error bars indicate the standard deviation. The bars for replica-exchange are smaller than 
the circles, (b) For T = oo. Converged: (c v ), averaged over 14 converged trajectories. Unconverged: 
( c v ), averaged over 18 unconverged trajectories. Error bars indicate standard deviation. 
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FIG. 5. (Color online) Distribution of spike locations with respect to energy for frustrated XY 
model (L = 16). Inset: The energy probability distribution P(E) = g(E)e~ E / T at temperature 
T = 0.01 (corresponding to E ~ —346 where spiking is dominant) and at T = 0.4 (phase transition 
temperature). 
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